Why Does Zipf's Law Break Down in Rank-Size Distribution of Cities?

Hiroto KUNINAKA * and Mitsugu Matsushita 
Department of Physics, Chuo University, Kasuga, Bunkyo-ku, Tokyo 112-8551 






We study the rank-size distribution of cities in Japan on the basis of data analysis and computer
simulation. From the census data after World War II, we find that the rank-size distribution
of cities is composed of two parts, each of which has its own power exponent. In addition,
the power exponent of the head part of the distribution changes in time, and Zipf's law holds
only in a restricted period. We show that Zipf's law broke down due to both the great Showa and
Heisei mergers and recovered due to population growth in middle-sized cities after the
great Showa merger.

KEYWORDS: population, rank-size distribution, power-law distribution, lognormal distribution, Zipf's law, 
Gibrat's law, agent-based model 



1. Introduction 

Many empirical data that obey power-law distributions are observed in both the natural and
social sciences. 1,2) For example, the size distribution of lunar craters, 3) the relation
between frequency and magnitude of earthquakes, 4) the size distribution of islands 5) and the
cumulative probability distribution of personal income in Japan 6) are known to obey
power-law distributions.

On the other hand, many studies have reported that empirical data obeying the lognormal
distribution are abundant around us. The lognormal distribution has the form

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}\,x} \exp\left[-\frac{\log^2(x/T)}{2\sigma^2}\right], \tag{1}$$

where $\sigma$ and $T$ are dispersion and average, respectively. For example, the
fragmentation of glass rods, 7) income distribution of families and single individuals in
the U.S., 8) the size distribution of barchan dunes, 9) food fragmentation by chewing 10) and
the duration of disability for aged people 11) are believed to obey Eq. (1) approximately.

The origin of those characteristic distributions is explained by the random multiplicative
process, which is often used to mimic the growth process of living organisms. 12) Let $X_i$
be a physical quantity at time step $i$ and suppose that its growth process is governed by
the following relation: $X_i = \alpha_{i-1} X_{i-1}$. Here, $\alpha_i$ is the growth rate at
the time step $i$. At the $m$-th step, $X_m$ can be written as
$X_m = X_0 \prod_{i=0}^{m-1} \alpha_i$, where $X_0$ is the initial quantity. Thus, when we
assume that $\alpha_i$ is a random variable independent of $X_i$ and $m$ is sufficiently
large, $\log X_m$ obeys the normal distribution due to the central limit theorem, which
entails the lognormal distribution of $X_m$. This process is often called Gibrat's process or
Gibrat's law, which is so common to many complex systems that one may say that the default
distribution is the lognormal distribution. However, if we introduce an additional term, such
as a noise term, into Gibrat's process, $X_i$ obeys a power-law distribution. 13,14)
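
The central-limit argument above can be checked numerically. The following Python sketch is only an illustration, not part of the original analysis: the growth rate is assumed to be uniform on [0.9, 1.1], and the names `multiplicative_process`, `sample_log_sizes` and `skewness` are hypothetical helpers. It iterates the multiplicative rule and reports the mean, standard deviation and skewness of log X; a skewness close to zero is consistent with log X being approximately normal.

```python
import math
import random
import statistics


def multiplicative_process(x0, steps, rng, low=0.9, high=1.1):
    """Return X_m = X_0 * prod(alpha_i) with alpha_i drawn independently of X."""
    x = x0
    for _ in range(steps):
        # Growth rate alpha_i; the uniform range is an illustrative assumption.
        x *= rng.uniform(low, high)
    return x


def sample_log_sizes(n_samples=2000, steps=200, seed=0):
    """Run many independent processes and collect log X_m."""
    rng = random.Random(seed)
    return [math.log(multiplicative_process(1.0, steps, rng))
            for _ in range(n_samples)]


def skewness(values):
    """Sample skewness; it vanishes for a symmetric (e.g. normal) distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


logs = sample_log_sizes()
print("mean of log X:    ", statistics.fmean(logs))
print("std of log X:     ", statistics.pstdev(logs))
print("skewness of log X:", skewness(logs))
```

Since log X is a sum of many independent terms log α, its mean and variance grow linearly with the number of steps, which is why X itself becomes broadly (lognormally) distributed.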

This paper focuses on the population distribution of cities in Japan. The population
distribution within a given region sometimes shows power-law behavior. Auerbach first
reported that the rank-size distribution for the population of cities obeys a power-law
distribution. 15) This means that when we order the cities by population and plot the rank
R(x) against the corresponding population x, the relation between R(x) and x can be
approximated by



$$\log R(x) = a - b \log x, \tag{2}$$



where a and b are fitting parameters. As for other municipalities, for example, Sasaki et al.
recently reported that the rank-size distributions of towns and villages can be well
approximated by lognormal distributions, while that of cities obeys a power-law distribution
in Japan. 16)

Zipf reported that the power exponent b is approximately 1 in the case of cities, so that the
special case b = 1 is generally called Zipf's law. 17) Since many empirical data, such as the
frequency of words in literature and the income of companies, 18) obey Zipf's law, it is
believed to be a universal regularity.

However, we can easily find that Zipf's law in the population distribution of cities is not
universal. 19) For example, Figs. 1(a) and 1(b) show the rank-size distribution of the top 300
cities of the U.S.A. in 2002 and that of the top 267 cities of Brazil in 2006, respectively.
The power exponents are b = 1.338 ± 0.004 and b = 1.230 ± 0.005, respectively, which differ
from b = 1 predicted by Zipf's law, although the distributions obey power-law behavior. Here,
we obtain those power exponents by least-squares linear regression after plotting the
logarithm of rank against the logarithm of population. In addition, we often find that the
rank-size distribution does not exhibit power-law behavior. 19) Even if the rank-size
distribution obeys a power-law distribution, the power exponent b can be expected to change
in time due to migration, a change of birth rate, and so on.
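
The fitting procedure described above (least-squares regression of the logarithm of rank against the logarithm of population) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `rank_size_exponent` and the synthetic city sizes are hypothetical, and no census data are used. For sizes constructed as x = C/R the fit should return b = 1.

```python
import math


def rank_size_exponent(populations):
    """Least-squares fit of log10 R = a - b * log10 x; returns (a, b)."""
    sizes = sorted(populations, reverse=True)          # rank 1 = largest city
    xs = [math.log10(x) for x in sizes]
    ys = [math.log10(rank) for rank in range(1, len(sizes) + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope                           # b is minus the slope


# Synthetic check (hypothetical data): x = 1e7 / R corresponds to b = 1.
synthetic = [1.0e7 / rank for rank in range(1, 301)]
a, b = rank_size_exponent(synthetic)
print("a =", a, "b =", b)
```

In practice the fit would be restricted to the range of populations over which the distribution looks straight on a log-log plot, since the estimated exponent depends on that range.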

Some stochastic models have been proposed to explain Zipf's law. For example, Simon's model
explains the emergence of Zipf's law in the rank-frequency distribution of words in
literature. 20) The point of this model is that, when a word is added to a text, the
probability that an existing word is repeated is proportional to the number of its previous
occurrences. This model explains that the rank-frequency distribution obeys a power-law
distribution with an exponent less than or equal to unity, which depends on the probability
that the added word is a new one. On the other hand, Cancho has developed a model to explain
the case in which the exponent becomes larger than unity. 21) Recently, by modifying Simon's
model, Zanette and Montemurro have developed a more realistic model to reproduce Zipf's law in
the rank-frequency distribution of words, which explains the exponents in some cases of
different languages. 22)
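
The repetition rule of Simon's model can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the formulation of ref. 20: the function name `simon_text`, the parameter `p_new` (the probability that the added word is new) and the text length are assumptions made for illustration. Choosing a uniformly random earlier token repeats each word with probability proportional to its previous number of occurrences.

```python
import random
from collections import Counter


def simon_text(n_words, p_new, seed=0):
    """Generate a text of integer word labels following Simon's repetition rule."""
    rng = random.Random(seed)
    text = [0]                 # start with a single word
    next_id = 1
    for _ in range(n_words - 1):
        if rng.random() < p_new:
            text.append(next_id)          # a brand-new word enters the text
            next_id += 1
        else:
            # A uniformly chosen earlier token is repeated, so a word is picked
            # with probability proportional to its current frequency.
            text.append(rng.choice(text))
    return text


text = simon_text(20000, p_new=0.1)
freqs = sorted(Counter(text).values(), reverse=True)   # rank-frequency list
print("distinct words:", len(freqs))
print("top five frequencies:", freqs[:5])
```

Plotting `freqs` against rank on a log-log scale gives the rank-frequency distribution discussed in the text.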

In this paper, we investigate the time evolution of the rank-size distribution of cities in
Japan to show how the power exponent b changes after World War II. In addition, we show that
Zipf's law holds only in a restricted period and explain why Zipf's law breaks down in Japan
from our results of data analysis and simulation. Our data analysis is based on the census
data from 1950 to 2006, which were obtained from the Statistics Bureau, Ministry of Internal
Affairs and Communications, Japan, 23) and a data book from the Japan Statistical
Association. 24)

The organization of this paper is as follows. In the next section, we show our data analyses
of the time evolution of the rank-size distribution for the population of cities and the
power exponent of its head part. Section 3 is devoted to modelling of population migration to
explain the time evolution of the power exponent, and §4 presents the simulation results. In
§5, we discuss our results of data analyses and simulation. The final section summarizes our
results.

2. Data Analyses 

Figure 2 shows the rank-size distributions for cities of Japan in 1950, 1960, 2000 and 2005.
In each year, the rank-size distribution can be divided into two parts. For example, the
distributions in 2000 and 2005 are clearly divided into two parts around a population of
$5.0 \times 10^5$. Thus, the head and the tail parts of each distribution can be fitted by
discrete power-law distribution functions.

The slopes of the head parts of the rank-size distributions change significantly from 1950 to
1960. This is mainly due to the fact that the number of cities drastically increased from 248
to 565 in the great Showa merger from 1955 to 1960. From 2000 to 2005, the slope of the head
part slightly increases from 1.027 ± 0.004 to 1.080 ± 0.004, although the two distributions
globally seem to be similar. Also in this case, the change in the slope of the head part is
affected by the increase in the number of cities due to the great Heisei merger from 2000.
Thus, the power exponent of the distribution changes easily through the great mergers of
municipalities.

Next, we investigate how the power exponent of the head part of the rank-size distribution
changed in time after World War II. Figure 3 shows the time evolution of the power exponent b
from 1950 to 2006. Error bars, which are almost invisible behind the data marks, are standard
deviations obtained by the least-squares linear regression. This figure shows that the power
exponent b changes drastically during the two great mergers in the Showa and Heisei eras.
After the great Showa merger finished in 1960, the power exponent b decreases monotonically
and approaches unity. The power exponent b stays near unity until the great Heisei merger
starts in 2000. Thus, it is shown that Zipf's law holds only in the period from 1970 to 2000
in Japan.

Fig. 1. Rank-size distributions for population of cities (a) in U.S.A., 2002 and (b) in
Brazil, 2006.

Fig. 2. Rank-size distribution of cities in Japan from 1950 to 2005.

Here, we should comment on the fitting range used to obtain b. As we can see in Fig. 2, the
range of the head part is not so large in each year. Thus, all the values in Fig. 3 are
obtained by regression within about one order of magnitude.

To explain the relaxation of b to unity, we investigate the time evolution of the growth rate
of population. To calculate the growth rate, we first categorize cities into groups. The n-th
group ($2 \le n \le 4$) is composed of 80 cities whose ranks range from $80n - 139$ to
$80n - 60$ in a given year (ranks 21 to 100, 101 to 180 and 181 to 260 for n = 2, 3 and 4),
while the first group (n = 1) is composed of 20 cities whose ranks range from 1 to 20. Note
that the constituents of each group change because the ranks of cities usually change at
every census year.

Fig. 3. Time evolution of power exponent b from 1950 to 2006.

We define the growth rate $P_n(t)$ of the n-th group at a census year t as

$$P_n(t) = \frac{1}{N_n} \sum_{i=1}^{N_n} \frac{x_i^n(t)}{x_i^n(t - \Delta t)}, \qquad
N_n = \begin{cases} 20 & (n = 1) \\ 80 & (n = 2, 3, 4), \end{cases} \tag{3}$$

where $x_i^n(t)$ is the population of the i-th city which belongs to the n-th group, and
$\Delta t$ is taken as $\Delta t = 5$, which is the interval between two successive census
years in Japan. Figure 4 shows the time evolution of the growth rate $P_n(t)$ of each group
from 1960 to 2000. The growth rate of the first group shows a global decrease, while those of
the other groups have apparent peaks in 1970 or 1975. This may be attributed to the following
two factors: (i) the migration from the big cities to their satellite cities or the
countryside, such as "U turn phenomena" or "I turn phenomena", which is remarkable after
1970 25) and (ii) the population increase due to the second baby boom in the first half of
the 1970s.
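
The grouping by rank and the group growth rate can be sketched in Python as follows. This is a hedged illustration: it assumes that the group growth rate is the mean of the per-city ratios x(t)/x(t − Δt) over the cities of a group, with groups defined by the ranks at year t; the function names and the synthetic populations are hypothetical and do not come from the census data.

```python
def rank_groups(populations_now):
    """Split city indices into rank groups 1-20, 21-100, 101-180 and 181-260."""
    order = sorted(range(len(populations_now)),
                   key=lambda i: populations_now[i], reverse=True)
    bounds = [(0, 20), (20, 100), (100, 180), (180, 260)]
    return [order[lo:hi] for lo, hi in bounds]


def group_growth_rates(populations_prev, populations_now):
    """Mean of x_i(t) / x_i(t - dt) within each rank group (assumed definition)."""
    rates = []
    for members in rank_groups(populations_now):
        ratios = [populations_now[i] / populations_prev[i] for i in members]
        rates.append(sum(ratios) / len(ratios))
    return rates


# Hypothetical data: 260 cities, the 20 largest grow by 2 %, the rest by 4 %.
prev = [1.0e6 / (k + 1) for k in range(260)]
now = [x * (1.02 if k < 20 else 1.04) for k, x in enumerate(prev)]
rates = group_growth_rates(prev, now)
print(rates)
```

Because the groups are rebuilt from the ranks at each census year, a city can move from one group to another between successive evaluations, as noted in the text.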

From these results, we can understand the time evolution of the power exponent b in Fig. 3 by
the following scenario:

(1) Before the great Showa merger starts, the power exponent b has a value near unity.

(2) Due to the increase in the number of cities caused by the great Showa merger, the power
exponent b increases.

(3) After the merger, while the increase in the number of cities is not so large, the
population of cities whose ranks range from 20 to 260 increases, which results in the
decrease of b.

(4) The power exponent b remains near unity until the great Heisei merger starts in 2000.



Table I. Number of cities of each year.

Year    Number of cities
1950    254
1960    561
1970    588
1980    647
1990    656



Here we would like to comment on why the period in which Zipf's law held continued for about
30 years. The head part of the rank-size distribution of cities consists of the groups with
$n \ge 2$. From Fig. 4, we can easily find that the growth rates of these groups have almost
the same value after 1975. In addition, the number of cities increased slowly after 1960,
while it had increased rapidly between 1950 and 1960 23) (see Table I). This may stabilize
the power exponent once Zipf's law holds and prevent the exponent from taking a value less
than unity.

3. Modelling on Population Migration 

In this section, we construct a model for the population migration to reproduce the increase
of b due to the merger of municipalities and its convergence to unity after the merger. Our
model is an agent-based model which consists of 3500 sites corresponding to all the
municipalities. Each site has a uniform random number between 0 and 1 as its initial
population. The basic procedure of one simulation step is summarized as follows:

(1) We randomly choose a source site m with the population $N_m$.

(2) We choose a group of sites, $G_{N<N_m}$ or $G_{N>N_m}$, which are the groups of the sites
whose population N is less than or more than $N_m$, respectively. The probability of choosing
$G_{N<N_m}$ is $\alpha$ (the migration parameter), while that of choosing $G_{N>N_m}$ is
$1 - \alpha$.

(3) Among the group of sites chosen in the previous step, we randomly choose the destination
site n for migration.

(4) $P_{mn}$ percent of $N_m$ is transferred to the site n, so that the populations of sites
m and n become $N_m - P_{mn} N_m$ and $N_n + P_{mn} N_m$, respectively (with $P_{mn}$ taken as
a fraction).
In the second step, the migration parameter $\alpha$ is introduced to describe the tendency of
people to migrate from large cities to less populated areas, which was evident after the high
economic growth from 1960 to the early 1970s. 25) In addition, $P_{mn}$ is randomly chosen in
the range from 0 to 20. We iterate this procedure $10^6$ times in our simulation. The sample
average is taken over 10 different initial population distributions for all the sites.
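
The four steps of one simulation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `migration_step` is hypothetical, the transferred percentage is drawn uniformly from 0 to 20, and a step is simply skipped when the chosen group is empty (a case the description above does not specify). The example uses far fewer sites and steps than the 3500 sites and $10^6$ steps quoted in the text.

```python
import random


def migration_step(pop, alpha, rng, max_percent=20.0):
    """Apply one migration step to the list of site populations, in place."""
    m = rng.randrange(len(pop))                        # step (1): source site
    smaller = [i for i, n in enumerate(pop) if n < pop[m]]
    larger = [i for i, n in enumerate(pop) if n > pop[m]]
    # Step (2): less populated sites with probability alpha, else more populated.
    group = smaller if rng.random() < alpha else larger
    if not group:
        return                                         # assumption: skip empty groups
    n = rng.choice(group)                              # step (3): destination site
    # Step (4): move P_mn percent of N_m from site m to site n.
    moved = rng.uniform(0.0, max_percent) / 100.0 * pop[m]
    pop[m] -= moved
    pop[n] += moved


rng = random.Random(1)
pop = [rng.random() for _ in range(200)]               # uniform initial populations
total_before = sum(pop)
for _ in range(5000):
    migration_step(pop, alpha=0.3, rng=rng)
print("largest site:", max(pop), "total:", sum(pop))
```

Each step only moves population between two sites, so the total population is conserved; the distribution among sites is what evolves.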

When the population of a given site becomes larger than 0.95, we regard the site as a city.
Once a site is promoted to a city, the site will not be demoted to a smaller municipality such
as a town or a village. This rule corresponds to a part of the Local Autonomy Law of Japan,
which says that municipalities must have a population of 50,000 or more to be promoted to
cities. 30) Our model does not distinguish between towns and villages. Thus, if a site does
not belong to cities, we henceforth call the site a "town".

After the first migration of $10^6$ simulation steps, we merge some municipalities according
to the following procedure. At first, we randomly choose two sites to merge among all the
sites. When neither of them is a city, we merge them to produce a new city if the sum of
their populations is larger than 0.95, while we merge them to produce a town if the sum is
less than 0.95. On the other hand, when at least one site is a city, we merge those two sites
with the probability $\beta = 0.5$ to become a new city. The probability $\beta$ is introduced
due to the fact that the frequency of the merger of towns was much larger than that of cities.
We iterate this merging process until the number of cities has increased by 77 on average
relative to the number at the end of the first migration stage. In our model, the increase in
the number of cities affects the power exponent after the merger. In general, the power
exponent increases with the increase in the number of cities generated by the merger.
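
A single merging attempt following the procedure above can be sketched in Python as follows. It is an illustration under stated assumptions: the function name `merge_once` and the data layout (parallel lists of populations and city flags) are hypothetical, the merged population is kept at the first of the two sites, and a sum exactly equal to 0.95 is treated as a town because the text leaves this case open.

```python
import random

CITY_THRESHOLD = 0.95   # population above which a site counts as a city


def merge_once(pop, is_city, beta, rng):
    """Try to merge two random sites; return True if a merger took place."""
    i, j = rng.sample(range(len(pop)), 2)
    if is_city[i] or is_city[j]:
        if rng.random() >= beta:
            return False                  # merger involving a city is not carried out
        merged_is_city = True             # at least one city: result is a new city
    else:
        # Two towns: the result is a city only if the summed population exceeds 0.95.
        merged_is_city = pop[i] + pop[j] > CITY_THRESHOLD
    pop[i] += pop[j]
    is_city[i] = merged_is_city
    del pop[j]                            # site j disappears after the merger
    del is_city[j]
    return True


rng = random.Random(2)
pop = [rng.random() for _ in range(500)]
is_city = [n > CITY_THRESHOLD for n in pop]
total_before = sum(pop)
merged = sum(merge_once(pop, is_city, beta=0.5, rng=rng) for _ in range(100))
print("mergers:", merged, "sites left:", len(pop), "cities:", sum(is_city))
```

In a full simulation this attempt would be repeated until the number of cities has grown by the desired amount, followed by the second migration stage.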

4. Simulation Results 

At first, we investigate the convergence of the rank-size distribution of cities generated by
our model. Figure 5 shows the rank-size distributions of cities at $10^5$, $10^6$ and $10^7$
simulation steps. To obtain these results, the value of $\alpha$ is fixed at $\alpha = 0.3$.
This figure shows that the rank-size distribution converges to the stationary power-law
distribution with the power exponent b = 1.012 ± 0.002. When the number of sites is more than
3500, our model needs more simulation steps for the convergence to the power-law distribution
with b = 1. Thus, our model can reproduce the power-law distribution of cities which converges
to Zipf's law. However, in our model, the number of cities keeps increasing after the power
exponent becomes b = 1, which slightly increases the power exponent.

Secondly, we investigate how the great merger affects the rank-size distribution of cities
through the time evolution of the power exponent b. In this simulation, we carry out the first
population migration of $10^6$ simulation steps. After that, we merge some of those sites,
followed by the second population migration of $7 \times 10^5$ simulation steps. Figure 6
shows the time evolution of the rank-size distribution of cities. The dotted line shows the
distribution after the first migration stage was finished. The solid line shows the
distribution after a merger of 200 sites. The open circles show the distribution after the
second migration stage was finished, which can be fitted by the power-law distribution with
the exponent b = 1.081 ± 0.001 denoted by the dash-dotted line. Here we find that the
distribution approaches the power-law distribution with the exponent b = 1 after the merger.

Fig. 5. Time evolution of rank-size distribution of cities without merging process.

Fig. 6. Time evolution of rank-size distribution of cities with merging process.

We show the relation between the power exponent b and the simulation step in Fig. 7. Error
bars, which are almost invisible on a few data marks, are standard deviations obtained by the
least-squares linear regression. The data point at $10^6$ steps shows the power exponent b
after the merger has finished. We find that b converges to unity after the increase of b due
to the merger. Thus, our model can reproduce the time evolution of b qualitatively.

Finally, we investigate how $\alpha$ affects the final distribution. Figure 8 shows the
relation between $\alpha$ and the power exponent b at $10^6$ simulation steps. The solid line
is the regression line $b = 3.7\alpha - 0.09$. This result indicates that $\alpha$ determines
the power exponent b of the final power-law distribution. Setting b = 1 in the regression line
gives $\alpha = 1.09/3.7 \approx 0.29$, so the convergence to Zipf's law requires
$\alpha \approx 0.3$ in this model.

Here we would like to comment on the effect of the initial population distribution on the
final distribution. When we give $N_i = 1.0$ as the initial value for all the sites, the power
exponents of the resulting distributions do not show a large difference.

Fig. 7. Time evolution of power exponent b.

Fig. 8. Relation between $\alpha$ and b.

5. Discussion

Let us discuss our results. From Fig. 3, we find that Zipf's law holds for 25 years and breaks
down due to the great Heisei merger. A question naturally arises as to whether Zipf's law also
held before 1950. However, during World War II, the number and distribution of people must
have fluctuated greatly due to the great air campaigns against large cities such as Tokyo and
Osaka, and evacuations from large cities to the countryside. When the population distribution
is unstable, there is little point in discussing whether Zipf's law holds, because the
distribution may no longer obey a power law.

We have found that the power exponent b approached unity after the great Showa merger had
finished. Some theoretical explanations for the emergence of Zipf's law have been proposed in
the literature. 20,26,27) Among them, Gabaix showed that Gibrat's law in the population growth
of each city is necessary for the emergence of Zipf's law. 26,28) Here, Gibrat's law means
that different cities grow randomly with a growth rate independent of the population of the
cities. We investigated the relation between the growth rate $P(t) = x(t)/x(t - \Delta t)$ and
the population $x(t)$ for all cities at t = 1970 (Fig. 9). The reason why we focus on t = 1970
is that the convergence to Zipf's law can be seen in this period (Fig. 3). Here, we can run
the regression,

$$\log P(1970) = 0.11 - (0.017 \pm 0.007) \log x(1970), \tag{4}$$

which has a slight slope, although the slope is supposed to be 0 if Gibrat's law holds. In
addition, the dispersion of the growth rate becomes rather large around
$\log x(1970) = 4.5$, so that we cannot clearly see whether Gibrat's law holds or not. In the
case of all municipalities, Sasaki et al. reported that the slope becomes almost 0 from 2000
to 2005, although it has a slight slope. 16)

Fig. 9. Relation between log of growth rate and log of population for cities from 1965 to
1970. Solid line is Eq. (4).



As mentioned in §1, a random multiplicative process with Gibrat's law generates the lognormal
distribution. The rank-size distribution for the population of all municipalities shows the
double-Pareto distribution, which consists of a lognormal body with a power-law tail.
14,16,29) Thus, it is no wonder that the regression line for the relation between the growth
rate and the population has a non-zero slope. Because the rank-size distribution of cities is
the tail part of that of municipalities, it may have a non-zero slope. Thus, in the case of
the population of cities, Gibrat's law may be just a necessary condition for the emergence of
Zipf's law.

To obtain the power exponent b of the rank-size distributions for cities, we adopt the
least-squares linear regression to fit those distribution functions by Eq. (2). Although this
method is used frequently in the literature, it is known that the method has several
problems. 2) To obtain a more reliable estimate for b, other estimation methods such as the
maximum likelihood method 2) may be better.
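
As an illustration of the maximum likelihood alternative, the following Python sketch uses the standard estimator for a continuous power-law tail: if the rank-size relation R(x) ∝ x^(−b) holds above a lower cutoff, the estimate is b = n / Σ ln(x_i / x_min) with approximate standard error b/√n. The function name `mle_exponent`, the cutoff value and the synthetic Pareto sample are assumptions for illustration; the estimator was not applied to the census data in this paper.

```python
import math
import random


def mle_exponent(sizes, x_min):
    """Maximum likelihood estimate of b for P(X >= x) ~ x^(-b), x >= x_min."""
    tail = [x for x in sizes if x >= x_min]
    n = len(tail)
    b_hat = n / sum(math.log(x / x_min) for x in tail)
    return b_hat, b_hat / math.sqrt(n)     # estimate and approximate standard error


# Synthetic Pareto sample with b = 1, generated by inverse-transform sampling.
rng = random.Random(3)
true_b, x_min = 1.0, 5.0e4
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / true_b) for _ in range(5000)]
b_hat, b_err = mle_exponent(sample, x_min)
print("b =", b_hat, "+/-", b_err)
```

Unlike the regression on a log-log plot, this estimate does not depend on binning or on how the ranks are weighted, although it still requires a choice of the lower cutoff `x_min`.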

In §3, we have constructed the agent-based model to explain the emergence and the breakdown
of Zipf's law in the rank-size distribution of cities. This model can reproduce Zipf's law
observed in processes in which many entities exchange physical quantities among them. We often
find Zipf's law in some phenomena without an apparent exchange process, such as word
frequencies in literature and the relation between the frequency and the magnitude of
earthquakes. However, even for these cases there may be some hidden exchange processes, such
as words going in and out of fashion and the accumulation and relaxation of crustal stress
due to plate tectonic movement. Moreover, exchange processes are almost universal in the
economic world as well as the social world. Hence we believe that the present model is
applicable to other problems as well.

In §4, we carried out a simulation of population migration to explain the time evolution of
the power exponent b in Fig. 3. In Fig. 6, after the merger of municipalities, the rank-size
distribution shifts upward over the whole range. If we use a value of $\beta$ smaller than
0.5, the distribution shifts upward in the region whose population is less than about 1.8.
Consequently, a small value of $\beta$ reduces the range in which the distribution can be
fitted by a single power-law distribution.

The rank-size distribution of all municipalities has a lognormal body and a power-law
tail, 16) which is observed also in our simulation. This type of distribution can be observed
in the agent-based simulation of exchanging quantities on the small-world network, 31) which
implies the possibility that the population migration network may have a small-world
structure. To clarify the relevance, we need to analyse the population migration network
between municipalities in detail.

6. Concluding Remarks 

In conclusion, we have investigated the time evolution of the rank-size distribution for the
population of cities to show how the power exponent changes in time. The rank-size
distribution shows power-law behavior, and the power exponent changes drastically when a great
merger of municipalities occurs. After the great Showa merger finished, the power exponent
converged to unity, which means that Zipf's law holds. We have explained the change of the
power exponent by the growth rates of the categorized groups of cities from the viewpoint of
migration.